If you think that spatial data is just a set of longitude and latitude observations for a given area of geographic space, and that it should be treated as such regardless, you’re missing out on opportunities and have a high chances of incurring in meaningless analysis. Spatial data is not, in fact, just a set of records of observable events and/or facts in geographical space. They are the (potential) starting point for answering relevant questions about various problems in a given reality.
In the New Era of Big Data, where decision-making is increasingly data-driven, Spatial Data Science has emerged as an essential field, with applications spanning many sectors of society (e.g. agriculture, environmental science, urban planning, clean energy generation, sanitation, water management, and much more.
For these and many other reasons, I challenge you to reflect on your convictions about Spatial Data, this important entity in the context of Big Data. A lot of things has changed, is changing and will change faster and faster, especially due to the advance of visual and computational technologies. No matter what your area of interest in this field, these changes presuppose, and to some extent demand, that you develop new skills.
At some point in your life you’ve heard someone say that “R is a powerful tool”. This is true, and you don’t need to make a big effort to find n reasons why it is. WWhen I wrote this notebook (using R Markdown), I did not it thinking about explaining what R is and how it works. The purpose of this notebook goes beyond this part, which I assume you already know. After all, my goal with this notebook is to demonstrate, with one possibility, how you can use R to start your studies in Spatial Data Science.
In this section, I start using spatial data about the occurrence (point geometries) of land degraded by erosion processes in natural and antropogenic ecosystems in a watershed located in the Brazilian Semiarid Region. Specifically, the data was collected and organized by me, after successive efforts in various field works in the state of Bahia, between the year 2022 and 2024.
“Land degradation is understood as the reduction of soil capacity to generate assets and services, in qualitative and quantitative terms, due to the decline of its productive potential and of its environmental regulation capacity” (LAL, 2012). From this perspective, land degradation is a complex process, associated with natural factors and human activities. According to Silva (2023), the land degradation is closely related to landscape degradation, considering the soil-landscape relationship.
On figure below I present a record (21-06-2023/17:06pm), organized by me, of an area highly degraded by an erosive process (Ravina), with a strong impact in landscape.
To reproduce this document, you need to have R installed on your machine, as well RStudio. RStudio is an Integrated Development Environment (IDE) for R.
In the chunk below you will set your Working Directory. To do this
you use setwd( ) and pass the Pass the address.
setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")
In the chunk below you will install the necessary packages. As I
already have these packages installed, note that I left the command as a
comment, adding the #. If you don’t have these packages
installed, just remove the # and run. To install a
package you use install.packages( ).
The first package to install is leafleat, a
visualization tool that is highly recommended for those who want
Interactive Maps.
The second package to install is sf (Simple Features)
package, a very useful feature for those who work with maps, very
interesting for representing and working with spatial vector data
including points, polygons, and lines.
#install.packages("leafleat")
#install.packages("sf")
Once installed, in the chunk below you will load the installed
packages. All you have to do is use library( ).
library(leaflet)
library(sf)
Here, in the chunk below, you will read the spatial data, in this case, a point geometries (shp.). Note that it will be stored in a variable named shape.
shape <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/LandDegradation.shp')
## Reading layer `LandDegradation' from data source
## `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\LandDegradation.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 2146 features and 12 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: -40.69852 ymin: -11.37433 xmax: -40.67941 ymax: -11.34879
## Geodetic CRS: WGS 84
In the chunk above, you can observe the properties of the data. The printed output shows:
the (selected) dataset has 2146 features (records) and 12 fields (attributes);
the geometry type is MULTIPOLYGON;
it has dimension XY, indicating that each point will consist of 2 coordinate values
the range of X and Y values of the geometry;
The Bounding box;
the coordinate reference system (CRS) is geodetic, with coordinates in degrees longitude and latitude.
In the chunk below you can create your first interactive map using
the leaflet package. In this situation, your interactive
map show the information of land degradation occurrence. Note that, the
infomation is presented by cluster. If you zoom in, you’ll see that the
clusters are gradually dissolving. This is a very interesting approach
when you have a large number of points that can visually pollute the
map.
leaflet(shape) %>%
setView(lat = -11.351051, lng = -40.684630, zoom = 16) %>%
addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery") %>%
addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
options = layersControlOptions(collapsed = TRUE)) %>%
addMarkers(clusterOptions = markerClusterOptions(),
label = shape$Tipo,
labelOptions = labelOptions(
textsize = "14px",
style = list(
"font-weight" = "bold",
padding = "5px",
lat = ~Lat, lng = ~Long)))
In the chunk below you can create your second interactive map using
the leaflet package. Note that this time the information is
categorized by the Tipo field (Type of Erosion), and is
also associated with colors (Yellow, Orange and Red) that reflect the
severity of the process.
Tipo <- c("Boçoroca", "Ravina", "Sulcos")
my_palette <- c("red", "orange", "yellow")
#previewColors(colorFactor(my_palette, levels = Tipo), Tipo)
factpal <- colorFactor(my_palette, levels = Tipo)
groups = unique(shape$Tipo)
map = leaflet(shape) %>%
setView(lat = -11.351051, lng = -40.684630, zoom = 17.3) %>%
addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery")
for (i in groups) {
data = shape[shape$Tipo == i, ]
map = map %>% addCircleMarkers(data = data, radius = 4,
weight = 2, opacity = 1, fill = TRUE, fillOpacity = 0.2, color = ~factpal(Tipo), group = i)}
map %>% addLayersControl(overlayGroups = groups, options = layersControlOptions(collapsed = FALSE))
A few background questions:
In this section, I will use spatial data (Polygon geometries) about the population density in a neighborhood of Formosa City, Goiás, Brazil. Specifically, the raw data describing the number of inhabitants at the census sector scale was collected and made available digitally by the Brazilian Institute of Geography and Statistics (IBGE). It was therefore up to me to acquire, organize and process them (I did it at an previous stage using resources of the GIS/QGIs environment) in order to arrive at the population density.
Population density is very useful information in a variety of situations involving decision-making in cities, but not only E. g., population density information may be relevant for identifying, mapping and monitoring sectors where there is a high risk of COVID-19 infection.
On figure below I present a record (13-02-2024/14:19pm), organized by me, of an street located in the Formosinha neighborhood, in the city of Formosa, Goiás, Brazil. Looking at the context (presence of buildings, arrangement of the houses…), you have indications that this is a sector where there is a considerable population density.
library(leaflet)
library(sf)
library(RColorBrewer)
library(classInt)
Here, in the chunk below, you will read the spatial data, in this case, a polygon geometries (shp.). Note that it will be stored in a variable named Population.
Population <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/PopulationDensity.shp')
## Reading layer `PopulationDensity' from data source
## `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\PopulationDensity.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 21 features and 4 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: -47.33796 ymin: -15.56236 xmax: -47.32025 ymax: -15.54231
## Geodetic CRS: WGS 84
If your data is in another SRC, on chunk below you can transform to leaflet projection. In this particular case, this does not apply because the data being read is already in the correct SRC.
#Population <- st_transform(Population, crs = '+proj=longlat +datum=WGS84')
Once this is understood, with the chunk below command you can plot the interactive map.
leaflet() %>%
addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery") %>%
addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
options = layersControlOptions(collapsed = TRUE)) %>%
setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
addMiniMap(width = 150, height = 150) %>%
addPolygons(data = Population, color = "black", stroke = 1, opacity = 0.8)
With the chunk command below, set the parameters to create a map categorized by the population density of the neighborhood.
pal_fun <- colorQuantile("YlOrRd", NULL, n = 5)
p_density <- paste0("<strong>Densid_Ha: </strong>", Population$Densid_Ha)
breaks_qt <- classIntervals(c(min(Population$Densid_Ha) - .00001, Population$Densid_Ha), n = 5, style = "jenks")
Considering the defined parameters, here you can plot and see the interactive map:
leaflet(Population) %>%
addPolygons(
stroke = TRUE,
color = "black",
fillColor = ~pal_fun(Densid_Ha),
fillOpacity = 0.8, smoothFactor = 0.5,
popup = p_density,
group = "Population") %>%
addTiles(group = "OSM") %>%
addProviderTiles("Esri.WorldImagery", group = "ESRI World Imagery") %>%
setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
addLegend("bottomright",
colors = brewer.pal(5, "YlOrRd"),
labels = paste0("Up to", format(breaks_qt$brks[-1], digits = 2)),
title = 'Population Density (h/ha)') %>% addLayersControl(baseGroups = c("OSM", "ESRI World Imagery"),
overlayGroups = c("Population"))
As obvious as it may seem, when working with spatial data, ALWAYS REMEMBER: data can’t explain itself, because it can’t talk; the map doesn’t show/indicate/suggest, because the map doesn’t have a life of its own. YOU are expected to do this, and what’s more, YOU are expected to see beyond the data. The limited set of data we are working with is not always enough to answer our questions, it is up to you to know the limits, the power of generalization of the information.
Thank you!
Raster are a particular form of data on Spatial Data Science.
In this section, I will use spatial data (Raster) on Cerrado Remnants in the Ribeirão João Leite Watershed, located in the Metropolitan Region of Goiânia. This data is authored by Silva and Nascimento (2024). In this paper, authors demonstrate the potential of Earth Observation data (obtained by Orbital Remote Sensing) for mapping Cerrado Remnants, providing crucial information for Environmental Planning and Integrated Territorial Management at the watershed scale.
The Brazilian Cerrado Remnants are one of the critical points for its conservation (MYERS et al., 2000). The total or partial disappearance of the continuous and natural landscapes of the Brazilian Cerrado is gradually affecting its biodiversity and important ecosystem services, such as the regulation of the carbon balance and the availability of water (FRANÇOSO et al., 2021; SANTOS et al., 2021). Cerrado Remnants play a crucial role in providing ecosystem services, such as regulating the climate, reducing surface runoff, helping to control soil erosion, protecting and contributing to the balance of aquatic ecosystems, and others (HOFMANN et al., 2021; HOFMANN et al., 2023. Despite this, studies show that in 2022 the suppression of natural vegetation cover in the Brazilian Cerrado increased by 25% compared to 2021, indicating that more than 50% of its remnants have already been suppressed (BARBOSA, 2011; MESSIAS et al., 2022).
Day after day, official data record that the continuous landscapes of natural vegetation cover in the Brazilian Cerrado continue to be intensely fragmented and are increasingly limited to fragments that are not only isolated from each other, but also pressured by the anthropogenic activities that take place around them (MOTA JUNIOR, SILVA and SILVA, 2021; SILVA, SILVA and SOUZA, 2022; MOTA JUNIOR, TRENTIN and SILVA 2023; CHAVES et al., 2023). Assessing the environmental quality of these Cerrado remnants is therefore vital to understanding their current condition and providing support for their conservation (LONGO et al., 2024; OLIVEIRA, SILVA and NASCIMENTO, 2024).
I leave you with an question:
what are the challenges and opportunities, today, for the conservation of Cerrado Remnants in the João Leite watershed, one of the main sources of water for supplying the population of the Goiânia Metropolitan Region?
In the chunk below you will set your Working Directory. To do this
you use setwd( ) and pass the Pass the address.
setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")
In the chunk below you will install the necessary packages.
Specifically, the library(sf) will make it possible to load
and read the vector data (Polygon) relating to the boundaries
of the Ribeirão João Leite Watershed. On the other hand, the
library(raster) will allow you to load and read the matrix
data (Raster) referring to the Cerrado Remnants that were
mapped by Silva
and Nascimento (2024).
The library(raster) is fundamental for manipulating and
analyzing spatial raster data in R. It provides an efficient data
structure for dealing with large sets of raster data, such as satellite
images, digital elevation models (DEM), climate data, and more. Finally,
library(mapview) will provide us with an interactive
interface for visualizing the spatial data in question.
Maybe library(mapview) is a ‘little more’ practical than
library(leaflet). What do you think? The fact is that
library(mapview) makes it significantly easier to create
interactive and exploratory maps directly in the environment R Markdown. That’s what I
think, anyway.